fix: report correct reason in kube_pod_status_reason metric #2644

carlosmorenokm1 · 2025-04-01T03:58:08Z

What this PR does / why we need it:
This PR updates the logic for generating the kube_pod_status_reason metric. Instead of only checking p.Status.Reason, the new implementation also verifies the pod conditions and the termination reasons of container statuses. This change fixes an issue where the metric always returned 0, even when a pod had a valid status reason (such as "Evicted", "NodeLost", etc.), leading to inaccurate monitoring data. Accurately reporting these values is crucial for diagnosing pod behavior and overall cluster health.
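Here is a minimal Go sketch of the three checks described above. It is not the merged code in internal/store/pod.go. To keep it runnable on its own, it uses simplified local structs instead of the k8s.io/api/core/v1 types, and the helper name podHasReason is hypothetical. It returns 1 as soon as the reason is found in p.Status.Reason, in any pod condition, or in a container's termination reason. This follows the return-1 pattern visible in the diff excerpt further down.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the k8s.io/api/core/v1 types.
// Only the fields discussed in this PR are modeled.
type PodCondition struct {
	Type   string
	Reason string
}

type ContainerStateTerminated struct {
	Reason string
}

type ContainerState struct {
	// Terminated is nil unless the container has terminated.
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	State ContainerState
}

type PodStatus struct {
	Reason            string
	Conditions        []PodCondition
	ContainerStatuses []ContainerStatus
}

type Pod struct {
	Status PodStatus
}

// podHasReason (hypothetical name) returns 1 if reason matches the pod-level
// Status.Reason, the Reason of any pod condition, or the termination reason
// of any container, and 0 otherwise.
func podHasReason(p *Pod, reason string) float64 {
	if p.Status.Reason == reason {
		return 1
	}
	for _, cond := range p.Status.Conditions {
		if cond.Reason == reason {
			return 1
		}
	}
	for _, cs := range p.Status.ContainerStatuses {
		// Skip containers that have not terminated (Terminated == nil).
		if t := cs.State.Terminated; t != nil && t.Reason == reason {
			return 1
		}
	}
	return 0
}

func main() {
	evicted := &Pod{Status: PodStatus{Reason: "Evicted"}}
	fmt.Println(podHasReason(evicted, "Evicted"))  // 1
	fmt.Println(podHasReason(evicted, "NodeLost")) // 0
}
```

The metric would call a check like this once per reason in podStatusReasons. This is why the value can become 1 even when the pod-level Reason field is empty.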

How does this change affect the cardinality of KSM:
It does not change the cardinality. The update only adjusts the value calculation for an existing metric family, so no new labels or metric series are introduced.

Which issue(s) this PR fixes:
Fixes #2612

linux-foundation-easycla · 2025-04-01T03:58:12Z

The committers listed above are authorized under a signed CLA.

✅ login: carlosmorenokm1 / name: Carlos Moreno (518db3d, 05187b4)

k8s-ci-robot · 2025-04-01T03:58:15Z

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2025-04-01T03:58:16Z

Welcome @carlosmorenokm1!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

CatherineF-dev · 2025-04-01T10:05:34Z

internal/store/pod.go

+	for _, cond := range p.Status.Conditions {
+		if cond.Reason == reason {
+			return 1
+		}
+	}


Should we only care about the last condition? If so, do we need to remove this part?

No, it's necessary to iterate through all the conditions because the reason may be in any of them.
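This Go sketch shows the difference behind the question above. It contrasts the loop from the diff, where any condition may carry the reason, with a hypothetical check that looks only at the last condition. The types are simplified stand-ins for v1.PodCondition. The condition types and reasons are placeholders, not real Kubernetes values.

```go
package main

import "fmt"

// PodCondition is a hypothetical, simplified stand-in for v1.PodCondition.
type PodCondition struct {
	Type   string
	Reason string
}

// anyConditionHasReason mirrors the loop from the diff: every condition is checked.
func anyConditionHasReason(conds []PodCondition, reason string) bool {
	for _, c := range conds {
		if c.Reason == reason {
			return true
		}
	}
	return false
}

// lastConditionHasReason is the alternative raised in review: only the
// final element of the slice is inspected.
func lastConditionHasReason(conds []PodCondition, reason string) bool {
	if len(conds) == 0 {
		return false
	}
	return conds[len(conds)-1].Reason == reason
}

func main() {
	// Placeholder condition types and reasons, chosen for illustration only.
	conds := []PodCondition{
		{Type: "ConditionA", Reason: "ExampleReason"},
		{Type: "ConditionB", Reason: ""},
	}
	fmt.Println(anyConditionHasReason(conds, "ExampleReason"))  // true
	fmt.Println(lastConditionHasReason(conds, "ExampleReason")) // false
}
```

If the reason sits on any condition other than the last one, a last-condition-only check reports 0 for it.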

Will it be a stale condition?

It will not be a stale condition. Kubernetes regularly updates Pod conditions, so if a condition with the corresponding reason is found, it is assumed to be current. If a stale condition were detected, that would indicate an issue in Kubernetes, not in this logic.

Will a pod have multiple different reasons?

Yes, a Pod can have different “Reasons” throughout its lifecycle. Each event or change in the Pod’s state (for example, container creation, image pulling, runtime errors, restarts, etc.) can trigger a different reason. In Kubernetes, these “Reasons” are recorded at different points in the Pod’s lifecycle, so it is entirely possible for a single Pod to go through multiple different “Reasons” as it transitions between states.

Yes, I was thinking of the case where the pod first fails to pull its image, then hits runtime errors, then restarts.

Will the above metric report all three of these statuses?

Yes, if your Pod transitions through those states (e.g., failed to pull image, runtime errors, then restarts), the metric can capture each corresponding reason at the time it occurs. However, you won’t necessarily see all reasons simultaneously; rather, you’ll see them reflected as changes in the metric over the Pod’s lifecycle.

CatherineF-dev · 2025-04-01T10:05:55Z

Could we update the title to be "fix: report correct reason in kube_pod_status_reason metric"

rexagod · 2025-05-07T21:45:42Z

Hello, thank you for the PR. Regarding the second and third conditions being added here, do we actually see those fields take values from podStatusReasons in practice? I did a quick search of kubernetes/kubernetes (k/k) and couldn't find an overlap for the field in the second condition, but I could be missing something. Happy to merge if that's not the case.

This change fixes an issue where the metric always returned 0, even when a pod had a valid status reason (such as "Evicted", "NodeLost", etc.), leading to inaccurate monitoring data.

Not sure I follow. If .Status.Reason is within the podStatusReasons set (i.e., valid), wouldn't the existing code below work as expected? Or does this point to fields that the existing version doesn't cover, which this patch adds?

if p.Status.Reason == reason {
	metric.Value = boolFloat64(true)
} else {
	metric.Value = boolFloat64(false)
}
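The following sketch applies the pre-patch comparison from the snippet above to every reason. It assumes an abbreviated, hypothetical reason list (not the real contents of podStatusReasons) and a local copy of the boolFloat64 helper. When p.Status.Reason is empty, every series evaluates to 0, which matches the symptom reported in #2612.

```go
package main

import "fmt"

// boolFloat64 is a local copy of the helper used in the snippet above.
func boolFloat64(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

// exampleReasons is a hypothetical, abbreviated list for illustration;
// it is NOT the actual contents of podStatusReasons.
var exampleReasons = []string{"Evicted", "NodeLost", "Shutdown"}

// oldReasonValues reproduces the pre-patch check: each reason series is 1
// only when it equals the pod-level Status.Reason.
func oldReasonValues(statusReason string) map[string]float64 {
	out := make(map[string]float64, len(exampleReasons))
	for _, reason := range exampleReasons {
		out[reason] = boolFloat64(statusReason == reason)
	}
	return out
}

func main() {
	fmt.Println(oldReasonValues("Evicted")) // map[Evicted:1 NodeLost:0 Shutdown:0]
	fmt.Println(oldReasonValues(""))        // map[Evicted:0 NodeLost:0 Shutdown:0]
}
```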

rexagod · 2025-05-12T08:41:44Z

To follow up on my earlier point, I believe we need to expand the set of values in podStatusReasons.

carlosmorenokm1 · 2025-05-16T22:22:37Z

Hi @rexagod,

In real-world environments we’ve observed that not all valid reasons show up in p.Status.Reason. For example:

When a container terminates due to OOMKilled or Completed, those values only live in containerStatus.State.Terminated.Reason, and p.Status.Reason ends up empty.

Some reasons like BackOff or image-pull errors are only recorded in pod.Status.Conditions[i].Reason.

If we only checked p.Status.Reason, the metric would still return 0 in those cases. With this patch, we also inspect pod conditions and container termination states, ensuring that any valid reason from podStatusReasons is correctly reflected in kube_pod_status_reason.
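Here is a minimal sketch of the OOMKilled case described above. It uses simplified stand-ins for the v1 types. The names oldValue and newValue are hypothetical, and the pod-condition check is omitted for brevity. The sketch does not assert whether a particular reason such as OOMKilled belongs to podStatusReasons. It only shows where the value lives and why a check on Status.Reason alone misses it.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the v1 container status types.
type ContainerStateTerminated struct{ Reason string }

type ContainerState struct {
	Terminated *ContainerStateTerminated // nil unless the container has terminated
}

type ContainerStatus struct{ State ContainerState }

type PodStatus struct {
	Reason            string
	ContainerStatuses []ContainerStatus
}

// oldValue reproduces the pre-patch check: only the pod-level reason counts.
func oldValue(s PodStatus, reason string) float64 {
	if s.Reason == reason {
		return 1
	}
	return 0
}

// newValue also consults container termination reasons, as the patch describes.
func newValue(s PodStatus, reason string) float64 {
	if oldValue(s, reason) == 1 {
		return 1
	}
	for _, cs := range s.ContainerStatuses {
		if t := cs.State.Terminated; t != nil && t.Reason == reason {
			return 1
		}
	}
	return 0
}

func main() {
	// An OOM-killed container whose pod-level Reason is empty.
	s := PodStatus{ContainerStatuses: []ContainerStatus{
		{State: ContainerState{Terminated: &ContainerStateTerminated{Reason: "OOMKilled"}}},
	}}
	fmt.Println(oldValue(s, "OOMKilled")) // 0
	fmt.Println(newValue(s, "OOMKilled")) // 1
}
```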

rexagod · 2025-05-26T04:00:47Z

It seems that podStatusReasons already includes the values that the newly checked fields can take, in addition to the values that were already present, e.g., Evicted, NodeLost, and Shutdown (from the original issue).

/lgtm

k8s-ci-robot · 2025-05-26T04:00:57Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carlosmorenokm1, rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rexagod]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from CatherineF-dev and dgrisonnet April 1, 2025 03:58

k8s-ci-robot added the cncf-cla: no label (the PR's author has not signed the CNCF CLA) Apr 1, 2025

k8s-ci-robot added the needs-triage label (the issue or PR lacks a `triage/foo` label and requires one) Apr 1, 2025

k8s-ci-robot added the size/M label (the PR changes 30-99 lines, ignoring generated files) Apr 1, 2025

fix: report correct values in kube_pod_status_reason metric

518db3d

carlosmorenokm1 force-pushed the fix-pod-status-reason branch from 23f9138 to 518db3d April 1, 2025 04:01

k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) and removed the cncf-cla: no label Apr 1, 2025

CatherineF-dev reviewed Apr 1, 2025

carlosmorenokm1 changed the title from "fix: report correct values in kube_pod_status_reason metric" to "fix: report correct reason in kube_pod_status_reason metric" Apr 1, 2025

mrueg reviewed Apr 1, 2025

internal/store/pod.go (resolved)

fix: report correct values in kube_pod_status_reason metric (added tests)

05187b4

k8s-ci-robot added the size/L label (the PR changes 100-499 lines, ignoring generated files) and removed the size/M label Apr 1, 2025

carlosmorenokm1 mentioned this pull request May 16, 2025

kube_pod_status_reason is 0 for all reasons #2612

Closed

k8s-ci-robot assigned rexagod May 26, 2025

k8s-ci-robot added the lgtm label ("Looks good to me"; the PR is ready to be merged) May 26, 2025

k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) May 26, 2025

k8s-ci-robot merged commit 28bf0e8 into kubernetes:main May 26, 2025
12 checks passed
